INTRODUCTION

We congratulate Drs. Kang and Schafer (KS hence- 
forth) for a careful and thought-provoking contribu- 
tion to the literature regarding the so-called "dou- 
ble robustness" property, a topic that still engenders 
some confusion and disagreement. The authors' approach
of focusing on the simplest situation of estimation
of the population mean $\mu$ of a response $y$
when y is not observed on all subjects according to a 
missing at random (MAR) mechanism (equivalently, 
estimation of the mean of a potential outcome in a 
causal model under the assumption of no unmea- 
sured confounders) is commendable, as the funda- 
mental issues can be explored without the distrac- 
tions of the messier notation and considerations re- 
quired in more complicated settings. Indeed, as the 
article demonstrates, this simple setting is sufficient 
to highlight a number of key points. 

As noted eloquently by Molenberghs (2005), in 
regard to how such missing data/causal inference 
problems are best addressed, two "schools" may be 
identified: the "likelihood-oriented" school and the 
"weighting-based" school. As we have emphasized 
previously (Davidian, Tsiatis and Leon, 2005), we 
prefer to view inference from the vantage point of 
semiparametric theory, focusing on the assumptions
embedded in the statistical models leading to differ- 
ent "types" of estimators (i.e., "likelihood-oriented" 
or "weighting-based") rather than on the forms of 
the estimators themselves. In this discussion, we hope 
to complement the presentation of the authors by 
elaborating on this point of view. 

Throughout, we use the same notation as in the 
paper. 

SEMIPARAMETRIC THEORY PERSPECTIVE 

As demonstrated by Robins, Rotnitzky and Zhao 
(1994) and Tsiatis (2006), exploiting the relation- 
ship between so-called influence functions and esti- 
mators is a fruitful approach to studying and con- 
trasting the (large-sample) properties of estimators 
for parameters of interest in a statistical model. We 
remind the reader that a statistical model is a class 
of densities that could have generated the observed 
data. Our presentation here is for scalar parameters 
such as $\mu$, but generalizes readily to vector-valued
parameters. If one restricts attention to estimators 
that are regular (i.e., not "pathological"; see Davidian,
Tsiatis and Leon, 2005, page 263 and Tsiatis
2006, pages 26-27), then, for a parameter $\mu$ in a
parametric or semiparametric statistical model, an
estimator $\hat\mu$ for $\mu$ based on independent and identically
distributed observed data $z_i$, $i = 1, \ldots, n$, is
said to be asymptotically linear if it satisfies

$$n^{1/2}(\hat\mu - \mu_0) = n^{-1/2} \sum_{i=1}^{n} \varphi(z_i) + o_p(1) \tag{1}$$

for $\varphi(z)$ with $E\{\varphi(z)\} = 0$ and $E\{\varphi^2(z)\} < \infty$, where
$\mu_0$ is the true value of $\mu$ generating the data, and
expectation is with respect to the true distribution
of $z$. The function $\varphi(z)$ is the influence function of
the estimator $\hat\mu$. A regular, asymptotically linear
estimator with influence function $\varphi(z)$ is consistent
and asymptotically normal with asymptotic variance
$E\{\varphi^2(z)\}$. Thus, there is an inextricable
connection between estimators and influence functions
in that the asymptotic behavior of an estimator is 
fully determined by its influence function, so that 
it suffices to focus on the influence function when 
discussing an estimator's properties. Many of the 
estimators discussed by KS are regular and asymp- 
totically linear; in the sequel, we refer to regular and 
asymptotically linear estimators as simply "estima- 
tors." 
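
As a minimal sketch of how an influence function determines large-sample behavior, consider the full-data sample mean, whose influence function is $\varphi(y) = y - \mu_0$ (a standard fact, not an example from KS). The asymptotic variance $E\{\varphi^2(z)\}$ can be estimated by the empirical mean of the squared, estimated influence function. The function name `influence_se` and the simulated data are hypothetical.

```python
import numpy as np

def influence_se(phi_values):
    """Standard error implied by estimated influence-function values.

    The asymptotic variance of a regular, asymptotically linear estimator
    is E{phi^2(z)}, so the standard error is roughly sqrt(mean(phi^2) / n).
    """
    phi_values = np.asarray(phi_values, dtype=float)
    n = phi_values.size
    return np.sqrt(np.mean(phi_values ** 2) / n)

# Hypothetical full-data example: the sample mean of y.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=3.0, size=500)
mu_hat = y.mean()
phi_hat = y - mu_hat            # estimated influence function of the sample mean
se_hat = influence_se(phi_hat)
```

For the sample mean this reduces to the usual standard error with divisor $n$ in the variance; the same recipe applies to any estimator whose influence function can be estimated.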

We capitalize on this connection by considering 
the problem of estimating $\mu$ in the setting in KS
in terms of statistical models that may be assumed 
for the observed data, from which influence func- 
tions corresponding to estimators valid under the 
assumed models may be derived. In the situation 
studied by KS, the "full" data that would ideally be 
observed are $(t, x, y)$; however, as $y$ is unobserved for
some subjects, the observed data available for analysis
are $z = (t, x, ty)$. As noted by KS, the MAR assumption
states that $y$ and $t$ are conditionally independent
given $x$; for example, $P(t = 1 \mid y, x) = P(t = 1 \mid x)$.
Under this assumption, all joint densities for
the observed data have the form

$$p(z) = p(y \mid x)^{I(t=1)}\, p(t \mid x)\, p(x), \tag{2}$$

where $p(y \mid x)$ is the density of $y$ given $x$, $p(t \mid x)$ is
the density of $t$ given $x$, and $p(x)$ is the marginal
density of $x$. Let $p_0(z)$ be the density in the class of
densities of form (2) generating the observed data
(the true joint density).

One may posit different statistical models by making
different assumptions on the components of (2).
We focus on three such models:

I. Make no assumptions on the forms of $p(x)$ or
$p(t \mid x)$, leaving these entirely unspecified. Make
a specific assumption on $p(y \mid x)$, namely, that
$E(y \mid x) = m(x, \beta)$ for some given function
$m(x, \beta)$ depending on parameters $\beta$ ($p \times 1$).
Denote the class of densities satisfying these
assumptions as $\mathcal{M}_{I}$.
II. Make no assumptions on the forms of $p(x)$ or
$p(y \mid x)$. Make a specific assumption on $p(t \mid x)$
that $P(t = 1 \mid x) = E(t \mid x) = \pi(x, \alpha)$ for some given
function $\pi(x, \alpha)$ depending on parameters $\alpha$ ($s \times 1$).
Here, we also require the assumption that
$P(t = 1 \mid x) \geq \epsilon > 0$ for all $x$ and some $\epsilon$. Denote
the class of densities satisfying these assumptions
as $\mathcal{M}_{II}$.
III. Make no assumptions on the form of $p(x)$, but
make specific assumptions on $p(y \mid x)$ and $p(t \mid x)$,
namely, that $E(y \mid x) = m(x, \beta)$ and $P(t = 1 \mid x) =
E(t \mid x) = \pi(x, \alpha) \geq \epsilon > 0$ for all $x$ and some $\epsilon$ for
given functions $m(x, \beta)$ and $\pi(x, \alpha)$ depending
on parameters $\beta$ and $\alpha$. The class of densities
satisfying these assumptions is $\mathcal{M}_{I} \cap \mathcal{M}_{II}$.

All of I-III are semiparametric statistical models in
that some aspects of $p(z)$ are left unspecified. Denote
by $m_0(x)$ the true function $E(y \mid x)$ and by $\pi_0(x)$
the true function $P(t = 1 \mid x) = E(t \mid x)$ corresponding
to the true density $p_0(z)$.

Semiparametric theory yields the form of all influence
functions corresponding to estimators for $\mu$
under each of the statistical models I-III. As discussed
in Tsiatis (2006, page 52), loosely speaking, a
consistent and asymptotically normal estimator for
$\mu$ in a statistical model has the property that, for all
$p(z)$ in the class of densities defined by the model,

$$n^{1/2}(\hat\mu - \mu) \xrightarrow{\mathcal{D}(p)} \mathcal{N}\{0, \sigma^2(p)\},$$

where $\xrightarrow{\mathcal{D}(p)}$ means convergence
in distribution under the density $p(z)$, and
$\sigma^2(p)$ is the asymptotic variance of $\hat\mu$ under $p(z)$.

If model I is correct, then $m_0(x) = m(x, \beta)$ for
some $\beta$, and it may be shown (e.g., Tsiatis, 2006,
Section 4.5) that all estimators for $\mu$ have influence
functions of the form

$$m_0(x) - \mu + t\,a(x)\{y - m_0(x)\} \tag{3}$$

for arbitrary functions $a(x)$ of $x$. If model II is correct,
then $\pi_0(x) = \pi(x, \alpha)$ for some $\alpha$, and all estimators
for $\mu$ have influence functions of the form

$$\frac{ty}{\pi_0(x)} + \frac{t - \pi_0(x)}{\pi_0(x)}\, h(x) - \mu \tag{4}$$

for arbitrary $h(x)$, which is well known from Robins,
Rotnitzky and Zhao (1994). If model III is correct,
then $m_0(x) = m(x, \beta)$ and $\pi_0(x) = \pi(x, \alpha)$ for some
$\beta$ and $\alpha$, and influence functions for estimators $\hat\mu$
have the form

$$m_0(x) - \mu + t\,a(x)\{y - m_0(x)\} + \frac{t - \pi_0(x)}{\pi_0(x)}\, h(x) \tag{5}$$

for arbitrary $a(x)$ and $h(x)$. Depending on forms of
$m(x, \beta)$ as a function of $\beta$ and $\pi(x, \alpha)$ as a function
of $\alpha$, there will be restrictions on the forms of $a(x)$
and $h(x)$; see below.

We now consider estimators discussed by KS from 
the perspective of influence functions. The regression
estimator $\hat\mu_{\text{OLS}}$ in (7) of KS comes about naturally
if one assumes model I is correct. In terms
of influence functions, $\hat\mu_{\text{OLS}}$ may be motivated by
considering the influence function (3) with $a(x) = 0$,
as this leads to the estimator $n^{-1}\sum_{i=1}^{n} m(x_i, \beta)$. In
fact, although KS do not discuss it, the "imputation
estimator" $\hat\mu_{\text{IMP}} = n^{-1}\sum_{i=1}^{n}\{t_i y_i + (1 - t_i)\, m(x_i, \beta)\}$
may be motivated by taking $a(x) = 1$ in (3). Of
course, in practice, $\beta$ must be estimated. In general,
(3) implies that all estimators for $\mu$ that are consistent
and asymptotically normal if model I is correct
must be asymptotically equivalent to an estimator
of the form

$$n^{-1}\sum_{i=1}^{n}\big[m(x_i, \hat\beta) + t_i\, a(x_i)\{y_i - m(x_i, \hat\beta)\}\big], \tag{6}$$

where $\beta$ is estimated by solving an estimating equation
$\sum_{i=1}^{n} t_i A(x_i, \beta)\{y_i - m(x_i, \beta)\} = 0$ for $A(x, \beta)$
($p \times 1$). Because $\beta$ is estimated, the influence function
of the estimator (6) with a particular $a(x)$ will
not be exactly equal to (3) with that same $a(x)$; instead,
it may be shown that the influence function of (6)
is of form (3) with $a(x)$ in (3) equal to

$$a(x) - E\big[\{\pi_0(x)\,a(x) - 1\}\, m_\beta^{T}(x, \beta_0)\big]\,\big[E\{\pi_0(x)\, A(x, \beta_0)\, m_\beta^{T}(x, \beta_0)\}\big]^{-1} A(x, \beta_0), \tag{7}$$

where $m_\beta(x, \beta)$ is the vector of partial derivatives
of $m(x, \beta)$ with respect to the elements of $\beta$, and $\beta_0$ is
such that $m_0(x) = m(x, \beta_0)$.
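
A minimal sketch of form (6) may help fix ideas. It assumes a linear working model $m(x, \beta) = \beta_0 + \beta_1 x$ with a scalar covariate, fitted by ordinary least squares among respondents; the function names and the noiseless simulated data are hypothetical and chosen only so that the result can be checked exactly.

```python
import numpy as np

def fit_ols_respondents(x, y, t):
    """Least-squares fit of m(x, beta) = beta0 + beta1 * x using respondents only."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)
    return beta, X @ beta       # coefficients and fitted m(x_i, beta_hat) for all i

def mu_model_one(y, t, m_hat, a):
    """Form (6): n^{-1} sum[ m_i + t_i * a_i * (y_i - m_i) ]."""
    resid = np.where(t == 1, y - m_hat, 0.0)   # y enters only where it is observed
    return np.mean(m_hat + a * resid)

# Hypothetical data: the outcome model is exactly linear (no noise).
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x
t = (rng.uniform(size=n) < 0.6).astype(int)

beta_hat, m_hat = fit_ols_respondents(x, y, t)
mu_ols = mu_model_one(y, t, m_hat, a=np.zeros(n))   # a(x) = 0: regression estimator
mu_imp = mu_model_one(y, t, m_hat, a=np.ones(n))    # a(x) = 1: imputation estimator
```

Because the data are noiseless, both choices of $a(x)$ return the full-data sample mean of $y$; with noisy data they would differ in finite samples while targeting the same $\mu$ under model I.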

The IPW estimator $\hat\mu_{\text{IPW-POP}}$ in (3) of KS and
its variants arise if one assumes model II. In particular,
$\hat\mu_{\text{IPW-POP}}$ can be motivated via the influence
function (4) with $h(x) = -\mu$. The estimator
$\hat\mu_{\text{IPW-NR}}$ in (4) of KS follows from (4) with $h(x) =
-E[y\{1 - \pi(x)\}]/E[\{1 - \pi(x)\}]$. In fact, if one restricts
$h(x)$ in (4) to be a constant, then, using the
fact that the expectation of the square of (4) is the
asymptotic variance of the estimator, one may find
the "best" such constant minimizing the variance as
$h(x) = -E[y\{1 - \pi(x)\}/\pi(x)]/E[\{1 - \pi(x)\}/\pi(x)]$.
An estimator based on this idea was given in (10) of
Lunceford and Davidian (2004, page 2943). In general,
as for model I, (4) implies that all estimators
for $\mu$ that are consistent and asymptotically normal
if model II is correct must be asymptotically equivalent
to an estimator of the form

$$n^{-1}\sum_{i=1}^{n}\left\{\frac{t_i y_i}{\pi(x_i, \hat\alpha)} + \frac{t_i - \pi(x_i, \hat\alpha)}{\pi(x_i, \hat\alpha)}\, h(x_i)\right\}, \tag{8}$$

where $\alpha$ is estimated by solving an equation of the
form $\sum_{i=1}^{n}\{t_i - \pi(x_i, \alpha)\}\, B(x_i, \alpha) = 0$ for some
($s \times 1$) $B(x_i, \alpha)$, almost always maximum likelihood for
binary regression. As above, because $\alpha$ is estimated,
the influence function of (8) is equal to (4) with $h(x)$
equal to

$$h(x) - E\big[\pi_\alpha^{T}(x, \alpha_0)\{m_0(x) + h(x)\}/\pi_0(x)\big]\,\big[E\{B(x, \alpha_0)\, \pi_\alpha^{T}(x, \alpha_0)\}\big]^{-1} B(x, \alpha_0)\, \pi_0(x), \tag{9}$$

where $\pi_\alpha(x, \alpha)$ is the vector of partial derivatives of
$\pi(x, \alpha)$ with respect to the elements of $\alpha$, and $\alpha_0$
satisfies $\pi_0(x) = \pi(x, \alpha_0)$.
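
The role of a constant $h(x)$ in form (8) can be sketched numerically. The code below assumes known propensities and takes the ratio-normalized estimator $\sum_i t_i y_i/\pi_i \big/ \sum_i t_i/\pi_i$ as the IPW estimator with population-type weights; it then checks the algebraic identity that form (8) with the constant $h = -\hat\mu$, evaluated at that ratio estimate, returns the same estimate. All function names and the simulated data are hypothetical.

```python
import numpy as np

def mu_form_8(y, t, pi, h):
    """Form (8): n^{-1} sum[ t_i y_i / pi_i + (t_i - pi_i) / pi_i * h_i ]."""
    ty = np.where(t == 1, y, 0.0)          # y enters only where it is observed
    return np.mean(ty / pi + (t - pi) / pi * h)

def mu_ipw_ratio(y, t, pi):
    """Ratio-normalized IPW estimator: sum(t y / pi) / sum(t / pi)."""
    ty = np.where(t == 1, y, 0.0)
    return np.sum(ty / pi) / np.sum(t / pi)

# Hypothetical data with propensities treated as known.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))
t = (rng.uniform(size=n) < pi).astype(int)
y = 3.0 + x + rng.normal(size=n)

mu_ht = mu_form_8(y, t, pi, h=0.0)                   # h(x) = 0
mu_ratio = mu_ipw_ratio(y, t, pi)
mu_fixed_point = mu_form_8(y, t, pi, h=-mu_ratio)    # constant h = -mu_hat
```

Other constants, such as a sample analogue of the variance-minimizing choice, could be passed through the same `h` argument.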

Doubly robust (DR) estimators are estimators that
are consistent and asymptotically normal for models
in $\mathcal{M}_{I} \cup \mathcal{M}_{II}$, that is, under the assumptions of
model I or model II. When the true density $p_0(z) \in
\mathcal{M}_{I} \cap \mathcal{M}_{II}$, then the influence function of any such
DR estimator must be equal to (3) with $a(x) =
1/\pi_0(x)$ or, equivalently, equal to (4) with $h(x) =
-m_0(x)$. Accordingly, when $p_0(z) \in \mathcal{M}_{I} \cap \mathcal{M}_{II}$, that
is, both models have been specified correctly, all
such DR estimators will have the same asymptotic
variance. This also implies that, if both models are
correctly specified, the asymptotic properties of the
estimator do not depend on the methods used to
estimate $\beta$ and $\alpha$.
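
The equivalence of the two representations, (3) with $a(x) = 1/\pi(x)$ and (4) with $h(x) = -m(x)$, holds term by term in the sample as well. The sketch below checks this for plug-in values of $\pi$ and $m$, whether or not they are correct; the function names, the deliberately wrong working models, and the simulated data are hypothetical.

```python
import numpy as np

def mu_dr_form_3(y, t, pi, m):
    """Form (6) with a(x) = 1/pi(x): n^{-1} sum[ m_i + t_i (y_i - m_i) / pi_i ]."""
    resid = np.where(t == 1, y - m, 0.0)
    return np.mean(m + resid / pi)

def mu_dr_form_4(y, t, pi, m):
    """Form (8) with h(x) = -m(x): n^{-1} sum[ t_i y_i / pi_i - (t_i - pi_i) m_i / pi_i ]."""
    ty = np.where(t == 1, y, 0.0)
    return np.mean(ty / pi - (t - pi) / pi * m)

# Hypothetical data.
rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * x)))
t = (rng.uniform(size=n) < pi_true).astype(int)
m_true = 1.0 + 2.0 * x
y = m_true + rng.normal(size=n)

pi_wrong = np.full(n, 0.5)      # deliberately misspecified propensity
m_wrong = np.zeros(n)           # deliberately misspecified regression

mu_a = mu_dr_form_3(y, t, pi_true, m_wrong)   # pi correct, m incorrect
mu_b = mu_dr_form_3(y, t, pi_wrong, m_true)   # pi incorrect, m correct
```

When the supplied $m$ reproduces $y$ exactly, the residual term vanishes and the estimate equals the mean of $m$ whatever $\pi$ is used, which is the sample counterpart of robustness to a wrong propensity model.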

KS discuss strategies for constructing DR estimators,
and they present several specific examples:
$\hat\mu_{\text{BC-OLS}}$ in their equation (8); the estimators below
(8) using POP or NR weights, which we denote
as $\hat\mu_{\text{BC-POP}}$ and $\hat\mu_{\text{BC-NR}}$, respectively; the estimator
$\hat\mu_{\text{WLS}}$ in their equation (10); $\hat\mu_{\pi\text{-cov}}$ in their equation
(12); and a version of $\hat\mu_{\pi\text{-cov}}$ equal to the estimator
proposed by Scharfstein, Rotnitzky and Robins
(1999) and Bang and Robins (2005), which we denote
as $\hat\mu_{\text{SRR}}$. The results for these estimators under
the "Correct-Correct" scenarios ($\mathcal{M}_{I} \cap \mathcal{M}_{II}$) in
Tables 5-8 of KS are consistent with the asymptotic
properties above. We note that $\hat\mu_{\pi\text{-cov}}$ is not DR under
$\mathcal{M}_{I} \cup \mathcal{M}_{II}$ because of the additional assumption
that the mean of $y$ given $\pi$ must be equal to a linear
combination of basis functions in $\pi$. Making this
additional assumption may not be unreasonable in
practice; however, strictly speaking, it takes $\hat\mu_{\pi\text{-cov}}$
outside the class of DR estimators discussed here,
and hence we do not consider it in the remainder of
this section. However, $\hat\mu_{\text{SRR}}$ is still in this class.

KS suggest that a characteristic distinguishing the 
performance of DR estimators is whether or not 
the estimator is within or outside the augmented 
inverse-probability weighted (AIPW) class. We find 
this distinction artificial, as all of the above estimators
$\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{BC-POP}}$, $\hat\mu_{\text{BC-NR}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$
can be expressed in an AIPW form. Namely, all of
these estimators are algebraically exactly of the form
(8) with $h(x_i)$ replaced by a term $-\hat\gamma - m(x_i, \hat\beta)$,
where $\hat\gamma_{\text{BC-OLS}} = \hat\gamma_{\text{WLS}} = \hat\gamma_{\text{SRR}} = 0$,

$$\hat\gamma_{\text{BC-POP}} = \frac{n^{-1}\sum_{i=1}^{n} (t_i/\hat\pi_i)(y_i - \hat m_i)}{n^{-1}\sum_{i=1}^{n} t_i/\hat\pi_i}, \qquad \hat\gamma_{\text{BC-NR}} = \frac{n^{-1}\sum_{i=1}^{n} \{t_i(1 - \hat\pi_i)/\hat\pi_i\}(y_i - \hat m_i)}{n^{-1}\sum_{i=1}^{n} t_i(1 - \hat\pi_i)/\hat\pi_i}, \tag{10}$$

where we write $\hat\pi_i = \pi(x_i, \hat\alpha)$ and $\hat m_i = m(x_i, \hat\beta)$ for
brevity. For $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$, this identity follows
from the fact that $\sum_{i=1}^{n} (t_i/\hat\pi_i)(y_i - \hat m_i) = 0$, which for
$\hat\mu_{\text{WLS}}$ holds because KS restrict to $m(x, \beta) = x^{T}\beta$,
with $x$ including a constant term. Thus, we contend
that issues of performance under $\mathcal{M}_{I} \cup \mathcal{M}_{II}$
are not linked to whether or not a DR estimator
is AIPW, but, rather, are a consequence of forms
of the influence functions of estimators under $\mathcal{M}_{I}$
or $\mathcal{M}_{II}$. In particular, under model II, it follows
that the above estimators have influence functions
of the form (4) with $h(x)$ equal to (9) with $h(x) =
-\{\gamma^* + m(x, \beta^*)\}$, where $\gamma^*$ and $\beta^*$ are the limits in
probability of $\hat\gamma$ and $\hat\beta$, respectively. Thus, features
determining performance of these estimators when
model II is correct are how close $\gamma^* + m(x, \beta^*)$ is
to $m_0(x)$ and how $\alpha$ is estimated, where maximum
likelihood is the optimal choice. In fact, this perspective
reveals that, for fixed $m(x, \beta)$, using ideas
similar to those in Tan (2006), the optimal choice
of $\gamma$ is as in $\hat\gamma_{\text{BC-NR}}$ with $t_i(1 - \hat\pi_i)/\hat\pi_i$ replaced by
$t_i(1 - \hat\pi_i)/\hat\pi_i^2$.
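
The AIPW representation can be verified numerically for the bias-corrected estimator with POP-type weights. The sketch assumes that this estimator has the form $n^{-1}\sum_i \hat m_i + \sum_i (t_i/\hat\pi_i)(y_i - \hat m_i) \big/ \sum_i t_i/\hat\pi_i$, and checks that it coincides with form (8) when $h(x_i) = -\hat\gamma - \hat m_i$ and $\hat\gamma$ is the ratio of the weighted mean residual to the mean weight. The function names, the arbitrary working regression, and the simulated data are hypothetical.

```python
import numpy as np

def mu_form_8(y, t, pi, h):
    """Form (8): n^{-1} sum[ t_i y_i / pi_i + (t_i - pi_i) / pi_i * h_i ]."""
    ty = np.where(t == 1, y, 0.0)
    return np.mean(ty / pi + (t - pi) / pi * h)

def mu_bc_pop(y, t, pi, m):
    """Assumed BC-POP form: mean(m) + sum(t (y - m) / pi) / sum(t / pi)."""
    resid = np.where(t == 1, y - m, 0.0)
    return np.mean(m) + np.sum(resid / pi) / np.sum(t / pi)

def gamma_bc_pop(y, t, pi, m):
    """Ratio of the weighted mean residual to the mean weight."""
    resid = np.where(t == 1, y - m, 0.0)
    return np.mean(resid / pi) / np.mean(t / pi)

# Hypothetical data; m is an arbitrary (possibly wrong) working regression.
rng = np.random.default_rng(4)
n = 250
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(0.2 + x)))
t = (rng.uniform(size=n) < pi).astype(int)
y = 1.0 + 2.0 * x + rng.normal(size=n)
m = 2.5 + 0.5 * x

gamma_hat = gamma_bc_pop(y, t, pi, m)
mu_direct = mu_bc_pop(y, t, pi, m)
mu_aipw = mu_form_8(y, t, pi, h=-(gamma_hat + m))
```

The two computations agree to floating-point accuracy for any inputs, which is the sense in which the estimator is "algebraically exactly" of AIPW form.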

Similarly, under model I, the influence functions
of these estimators are of the form (3) with $a(x)$
equal to (7) with $a(x) = \psi_1/\pi(x, \alpha^*) + \psi_2$, where $\alpha^*$
is the limit in probability of $\hat\alpha$ and $\psi_1 = 1$ and $\psi_2 = 0$
for $\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$; $\psi_1 = 1/E\{\pi_0(x)/\pi(x, \alpha^*)\}$
and $\psi_2 = 0$ for $\hat\mu_{\text{BC-POP}}$; and $\psi_1$ and $\psi_2$ for
$\hat\mu_{\text{BC-NR}}$ are more complicated expectations involving
$\pi_0(x)$ and $\pi(x, \alpha^*)$. Thus, under model I, features
determining performance of these estimators
are the form of $a(x)$ and how $\beta$ is estimated through
the choice of $A(x, \beta)$.

We may interpret some of the results in Tables 
5, 6 and 8 of KS in light of these observations. Under
the "$\pi$-model Correct-$y$-model Incorrect" scenario
($\mathcal{M}_{II} \cap \mathcal{M}_{I}^{c}$), $\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$ show
some nontrivial differences in performance, which,
from above, are likely attributable to differences in
$m(x, \beta^*)$. Under the "$\pi$-model Incorrect-$y$-model
Correct" scenario ($\mathcal{M}_{I} \cap \mathcal{M}_{II}^{c}$), all three estimators share the
same $a(x)$ but use different methods to estimate $\beta$,
so that any differences are dictated entirely by the
choice of $A(x, \beta)$. The poor performance of $\hat\mu_{\text{SRR}}$ can
be understood from this perspective: for this estimator,
$\beta$ is actually the $\beta$ in the model $m(x, \beta)$ used by
the other two estimators concatenated with an additional
element, the coefficient of $\hat\pi^{-1}$. The $A(x, \beta)$
for $\hat\mu_{\text{SRR}}$ thus involves a design matrix that is unstable
for small $\hat\pi_i$, consistent with the comment of
KS at the end of their Section 3.

In summary, we believe that studying the perfor- 
mance of estimators via their influence functions can 
provide useful insights. Our preceding remarks re- 
fer to large-sample performance, which depends di- 
rectly on the influence function. Estimators with the 
same influence function can exhibit different finite- 
sample properties. It may be possible via higher- 
order expansions to gain an understanding of some 
of this behavior; to the best of our knowledge, this 
is an open question. 

BOTH MODELS INCORRECT 

The developments in the previous section are relevant
in $\mathcal{M}_{I} \cup \mathcal{M}_{II}$. Key themes of KS are performance
of DR and other estimators outside this class,
that is, when both the models $\pi(x, \alpha)$ and $m(x, \beta)$
are incorrectly specified, and choice of estimator under
these circumstances.

One way to study performance in this situation is 
through simulation. KS have devised a very inter- 
esting and instructive specific simulation scenario 
that highlights some important features of various 
estimators. In particular, the KS scenario emphasizes
the difficulties encountered with some of the
DR estimators when $\pi(x_i, \hat\alpha)$ is small for some $x_i$.
Indeed, in our experience, poor performance of DR
and IPW estimators in practice can result from a few
small $\pi(x_i, \hat\alpha)$. When there are small $\pi(x_i, \hat\alpha)$, as
noted by KS, responses are not observed for some portion
of the $x$ space. Consequently, estimators like
$\hat\mu_{\text{OLS}}$ rely on extrapolation into that part of the $x$
space. KS have constructed a scenario where failure
to observe $y$ in a portion of the $x$ space can
wreak havoc on some estimators that make use of
the $\pi(x_i, \hat\alpha)$ but has minimal impact on the quality
of extrapolations for these $x$ based on $m(x, \beta)$.
One could equally well build a scenario where the
$x$ for which $y$ is unobserved are highly influential
for the regression $m(x, \beta)$ and hence could result in
deleterious performance of $\hat\mu_{\text{OLS}}$. We thus reiterate
the remark of KS that, although simulations can be 
illuminating, they cannot yield broadly applicable 
conclusions. 

Given this, we offer some thoughts on other strategies
for deriving estimators that may have some robustness
properties under the foregoing conditions,
that is, offer good performance outside $\mathcal{M}_{I} \cup \mathcal{M}_{II}$.
One approach may be to search outside the class
of DR estimators valid under $\mathcal{M}_{I} \cup \mathcal{M}_{II}$. For example,
as suggested by the simulations of KS, estimators
in the spirit of $\hat\mu_{\pi\text{-cov}}$, which impose additional
assumptions rendering them DR in the strict
sense only in a subset of $\mathcal{M}_{I} \cup \mathcal{M}_{II}$, may compensate
for this restriction by yielding more robust performance
outside $\mathcal{M}_{I} \cup \mathcal{M}_{II}$; further study along these
lines would be interesting. An alternative tactic for
searching outside $\mathcal{M}_{I} \cup \mathcal{M}_{II}$ may be to consider the
form of influence functions (5) for estimators valid
under $\mathcal{M}_{I} \cap \mathcal{M}_{II}$. For instance, a "hybrid" estimator
of the form

$$n^{-1}\sum_{i=1}^{n}\left[ m(x_i, \hat\beta)\, I\{\pi(x_i, \hat\alpha) < \delta\} + \left\{\frac{t_i y_i}{\pi(x_i, \hat\alpha)} + \frac{t_i - \pi(x_i, \hat\alpha)}{\pi(x_i, \hat\alpha)}\, h(x_i)\right\} I\{\pi(x_i, \hat\alpha) \geq \delta\}\right]$$

for $\delta$ small, may take advantage of the desirable
properties of both $\hat\mu_{\text{OLS}}$ and DR estimators.
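
A minimal sketch of such a hybrid estimator follows. It uses the regression prediction where the estimated propensity falls below $\delta$ and the augmented IPW term elsewhere. The function name `mu_hybrid`, the choice $h(x) = -m(x)$, the threshold value and the simulated data are hypothetical, and no claim is made about the estimator's properties.

```python
import numpy as np

def mu_hybrid(y, t, pi, m, h, delta):
    """Hybrid form: m_i where pi_i < delta, otherwise
    t_i y_i / pi_i + (t_i - pi_i) / pi_i * h_i; averaged over i."""
    ty = np.where(t == 1, y, 0.0)            # y enters only where it is observed
    ipw_part = ty / pi + (t - pi) / pi * h
    return np.mean(np.where(pi < delta, m, ipw_part))

# Hypothetical data with some small propensities.
rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
t = (rng.uniform(size=n) < pi).astype(int)
y = 1.0 + 2.0 * x + rng.normal(size=n)
m = 1.0 + 2.0 * x                            # working regression predictions

mu_hyb = mu_hybrid(y, t, pi, m, h=-m, delta=0.05)
mu_all_dr = mu_hybrid(y, t, pi, m, h=-m, delta=0.0)     # never switches to m
mu_all_reg = mu_hybrid(y, t, pi, m, h=-m, delta=1.1)    # always uses m
```

The two limiting thresholds recover the DR form and the regression estimator, respectively, so $\delta$ interpolates between them.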

A second possible strategy for identifying robust 
estimators arises from the following observation. Consider
the estimator

$$n^{-1}\sum_{i=1}^{n}\left\{\frac{t_i y_i}{\pi(x_i)} - \frac{t_i - \pi(x_i)}{\pi(x_i)}\, m(x_i, \hat\beta)\right\}. \tag{11}$$

If $\pi(x_i) = \pi(x_i, \hat\alpha)$, then (11) yields one form of a DR
estimator. If $\pi(x_i) = 1$, then (11) results in the imputation
estimator. If $\pi(x_i) = \infty$, (11) reduces to $\hat\mu_{\text{OLS}}$.
This suggests that it may be possible to develop estimators
based on alternative choices of $\pi(x_i)$ that
may have good robustness properties. For example,
a method for obtaining estimators $\pi(x_i, \hat\alpha)$ that
shrinks these toward a common value may prove
fruitful. The suggestion of KS to move away from
logistic regression models for $\pi(x_i, \alpha)$ is in a similar
spirit.
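
The limiting cases of (11) are easy to check numerically. The sketch below treats the quantity in the role of $\pi(x_i)$ as a generic positive weight `w`; the helper `shrink`, a convex combination of each propensity with the sample mean propensity, is only one hypothetical way to shrink toward a common value and is not a method proposed here or by KS. All names and data are illustrative.

```python
import numpy as np

def mu_form_11(y, t, w, m):
    """Form (11): n^{-1} sum[ t_i y_i / w_i - (t_i - w_i) / w_i * m_i ],
    where w_i plays the role of pi(x_i)."""
    ty = np.where(t == 1, y, 0.0)
    return np.mean(ty / w - (t - w) / w * m)

def shrink(pi, lam):
    """Hypothetical shrinkage toward a common value: (1 - lam) * pi_i + lam * mean(pi)."""
    return (1.0 - lam) * pi + lam * np.mean(pi)

# Hypothetical data.
rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
t = (rng.uniform(size=n) < pi).astype(int)
y = 1.0 + 2.0 * x + rng.normal(size=n)
m = 0.8 + 1.9 * x                                   # working regression predictions

mu_dr = mu_form_11(y, t, pi, m)                     # w = estimated propensity
mu_imp = mu_form_11(y, t, np.ones(n), m)            # w = 1: imputation estimator
mu_reg = mu_form_11(y, t, np.full(n, 1e12), m)      # very large w: approaches mean(m)
mu_shrunk = mu_form_11(y, t, shrink(pi, 0.5), m)    # propensities shrunk halfway
```

A very large finite weight is used instead of `np.inf`, because `(t - w) / w` evaluates to NaN at infinity in floating-point arithmetic.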



Finally, we note that yet another approach to 
developing estimators would be to start with the 
premise that one makes no parametric assumption on
the forms of $E(y \mid x)$ and $E(t \mid x)$ beyond some mild
smoothness conditions. Here, it is likely that first- 
order asymptotic theory, as in the previous section, 
may no longer be applicable. It may be necessary to 
use higher-order asymptotic theory to make progress 
in this direction; see, for example, Robins and van 
der Vaart (2006). 

CONCLUDING REMARKS 

We again compliment the authors for their thought- 
ful and insightful article, and we appreciate the op- 
portunity to offer our perspectives on this important 
problem. We look forward to new methodological 
developments that may overcome some of the chal- 
lenges brought into focus by KS in their article. 